PS-DBSCAN: An Efficient Parallel DBSCAN Algorithm Based on Platform Of AI (PAI)
نویسندگان
چکیده
We present PS-DBSCAN, a communication ecient parallel DBSCAN algorithm that combines the disjoint-set data structure and Parameter Server framework in Platform of AI (PAI). Since data points within the same cluster may be distributed over dierent workers which result in several disjoint-sets, merging them incurs large communication costs. In our algorithm, we employ a fast global union approach to union the disjoint-sets to alleviate the communication burden. Experiments over the datasets of dierent scales demonstrate that PS-DBSCAN outperforms the PDSDBSCAN with 2-10 times speedup on communication eciency. We have released our PS-DBSCAN in an algorithm platform called Platform of AI (PAI) 1 in Alibaba Cloud. We have also demonstrated how to use the method in PAI.
منابع مشابه
Research on the Parallelization of the DBSCAN Clustering Algorithm for Spatial Data Mining Based on the Spark Platform
Density-based spatial clustering of applications with noise (DBSCAN) is a density-based clustering algorithm that has the characteristics of being able to discover clusters of any shape, effectively distinguishing noise points and naturally supporting spatial databases. DBSCAN has been widely used in the field of spatial data mining. This paper studies the parallelization design and realization...
متن کاملبررسی مشکلات الگوریتم خوشه بندی DBSCAN و مروری بر بهبودهای ارائهشده برای آن
Clustering is an important knowledge discovery technique in the database. Density-based clustering algorithms are one of the main methods for clustering in data mining. These algorithms have some special features including being independent from the shape of the clusters, highly understandable and ease of use. DBSCAN is a base algorithm for density-based clustering algorithms. DBSCAN is able to...
متن کاملImprovement of density-based clustering algorithm using modifying the density definitions and input parameter
Clustering is one of the main tasks in data mining, which means grouping similar samples. In general, there is a wide variety of clustering algorithms. One of these categories is density-based clustering. Various algorithms have been proposed for this method; one of the most widely used algorithms called DBSCAN. DBSCAN can identify clusters of different shapes in the dataset and automatically i...
متن کاملAn Efficient Density-based Clustering Algorithm for Higher-Dimensional Data
DBSCAN is a typically used clustering algorithm due to its clustering ability for arbitrarily-shaped clusters and its robustness to outliers. Generally, the complexity of DBSCAN is O(n) in the worst case, and it practically becomes more severe in higher dimension. Grid-based DBSCAN is one of the recent improved algorithms aiming at facilitating efficiency. However, the performance of grid-based...
متن کاملExperiments in Parallel Clustering with DBSCAN
We present a new result concerning the parallelisation of DBSCAN, a Data Mining algorithm for density-based spatial clustering. The overall structure of DBSCAN has been mapped to a skeletonstructured program that performs parallel exploration of each cluster. The approach is useful to improve performance on high-dimensional data, and is general w.r.t. the spatial index structure used. We report...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- CoRR
دوره abs/1711.01034 شماره
صفحات -
تاریخ انتشار 2017